A modified weighted chimp optimization algorithm for training feed-forward neural network

Swarm intelligence (SI) algorithms have an excellent ability to search for the optimal solution, and they apply two mechanisms during the search. The first mechanism is exploration, which covers a vast area of the search space; when a promising area is found, the algorithm switches from exploration to the second mechanism, exploitation. A good SI algorithm balances exploration and exploitation. In this paper, we propose a modified version of the chimp optimization algorithm (ChOA) to train a feed-forward neural network (FNN). The proposed algorithm is called the modified weighted chimp optimization algorithm (MWChOA). The main drawback of the standard ChOA and the weighted chimp optimization algorithm (WChOA) is that they can be trapped in local optima, because most of the solutions update their positions based on the positions of the four leader solutions in the population. In the proposed algorithm, we reduce the number of leader solutions from four to three, and we find that this reduction enhances the search, strengthens the exploration phase, and avoids trapping in local optima. We test the proposed algorithm on eleven datasets and compare it against 16 SI algorithms. The results show that the proposed algorithm trains the FNN successfully when compared with the other SI algorithms.


Introduction
Using machine learning approaches, classification [1,2], function approximation, pattern recognition [3], prediction [4] and others [5,6] have become common applications in a variety of academic subjects [7]. Groundwater management problems, data mining, climatic and environmental problems, pharmaceuticals, engineering design issues, image segmentation, power flow, solar PV modules, and other topics are among the most well-known applications in which neural networks have been used [8][9][10]. Artificial neural networks (ANNs) are undoubtedly ranked among the most reputable methods in this field and have been extensively used to solve various problems. ANNs [11][12][13] are inspired by non-parametric mathematical models of physiological neural networks [14]. In the field of computational intelligence, ANNs are one of the most important inventions. They typically handle classification problems by simulating neurons in the human brain [15][16][17]. The first primitive conceptions of NNs were developed in 1943 [18]. Several types of ANN exist, including the feed-forward network [19], the radial basis function (RBF) network [20], the recurrent neural network [21], and the convolutional neural network (CNN) [22]. The FNN is the most popular among them due to its straightforward design and effective functionality [23][24][25]. ANNs perform well, are simple to implement, and can capture the hidden relationships between the inputs. Furthermore, ANNs can be implemented in parallel architectures and have excellent scalability, so they can benefit from current technological breakthroughs [26,27]. ANNs have a remarkable ability to tackle difficult problems such as function approximation [28], data classification [29], image recognition [30], control and modelling of nonlinear systems [31], and environmental forecasting [32]. The ability to learn is one of the most important qualities of an ANN. An ANN can be changed by modifying its structure.
There are four main learning procedures for neural networks: supervised learning [33], unsupervised learning [34], reinforcement learning [35], and meta-heuristic learning [36].
When the problem outputs are known in advance, such as in pattern recognition and classification tasks, supervised learning is utilised. The back-propagation (BP) method [37], which is a gradient-based technique, is a typical supervised learning strategy used in ANNs. Slow convergence and premature convergence to a local optimum are two shortcomings of BP that make it unsuitable for practical applications [38,39].
When the outputs are missing or uncertain, unsupervised learning is used. Text categorization and clustering applications typically use unsupervised learning [40]. Reinforcement learning, on the other hand, is utilised when the problem has a complex stochastic structure and is difficult to evaluate, such as control optimization problems. Meta-heuristic algorithms [41,42] are search strategies for locating a sufficiently good solution to optimization problems. Meta-heuristic learning can estimate optimal or near-optimal connection weights for an ANN with a lower chance of getting stuck in the many local optima of the search space [43]. ANNs have been trained using a variety of meta-heuristic learning algorithms [44], including the Genetic Algorithm (GA) [45], Particle Swarm Optimization (PSO) [46], Evolutionary Strategies (ES) [47], Ant Colony Optimization (ACO) [48], Cuckoo Search (CS) [49], Firefly Algorithm (FA) [50], Population-Based Incremental Learning (PBIL) [51], Differential Evolution (DE) [52], Artificial Bee Colony (ABC) [53], and many other algorithms. The well-known No Free Lunch (NFL) theorem [54][55][56] has demonstrated that no single meta-heuristic algorithm can train ANNs perfectly and handle all types of problems. GA is less likely to become trapped in local optima, although it converges more slowly, and it performs poorly in applications that call for real-time processing. ABC requires complex computations. The ES algorithm performs poorly since it was built using different mutation techniques. In evolutionary algorithms, mutation preserves population diversity and encourages exploitation, which is one of the primary causes of the subpar performance of ES. Additionally, ES uses a deterministic method for selecting individuals. As a result, the selection of individuals is less random, and local optima avoidance is weaker.
GWO falls into local optima despite its low complexity and quick convergence, hence it is unsuitable for problems with many local optima. The complexity of the SSA algorithm and its large number of control parameters are two of its flaws. DE is unsuitable for real-time use due to its numerous control settings and time-consuming calculations. The primary driving force behind this work is the fact that existing multi-solution stochastic trainers are still susceptible to local optima stagnation. As a result of all of these factors, many researchers are turning to other meta-heuristic approaches to train ANNs. The imbalance between the two phases of exploration and exploitation is the primary cause of becoming stuck in local optima. In order to address the two main problems of slow convergence and trapping in local optima when solving optimization problems, this paper proposes a Modified Weighted ChOA (MWChOA) for training a multi-layer feed-forward neural network model. The solutions in the proposed algorithm update their positions based on the positions of three leader solutions instead of the four leader solutions used in the standard ChOA algorithm. This modification balances exploration and exploitation and helps the algorithm avoid trapping in local optima.
The main contributions of this paper are as follows:
1. A new modified version of the standard ChOA and the WChOA is introduced to find the optimal weights and biases of the FNN.
2. The number of leader solutions is reduced from four to three to balance the exploration and exploitation processes and to avoid becoming stuck in local optima.
3. The proposed algorithm is tested on eleven benchmark datasets and compared against 16 SI algorithms.
The remaining sections of the paper are arranged as follows: Section 2 reviews relevant recent publications. The description and structure of a multilayer feed-forward neural network (FNN) are introduced in Section 3. The proposed MWChOA algorithm is introduced in Section 4. The experimental results are reported in Section 5. Section 6 summarises the content of this work and offers some suggestions for future research.

Related work
In the recent decade, ANN learning has gotten a lot of attention as a way to increase the efficiency of ANN modelling outputs. Many researchers have successfully trained neural networks using various well-known metaheuristic optimization techniques. In this section, we will present a review of some studies using metaheuristic optimization techniques for training neural networks.
One of the first meta-heuristic algorithms for training feed-forward neural networks was the Genetic Algorithm (GA) [57]. Some researchers have used enhanced GA to train neural networks [58]. The weights and network topology of MLP networks were evolved using particle swarm optimization (PSO) in [59], where the modified PSO-based learning approach outperformed other optimizers in terms of accuracy. Other researchers have used modified PSO in studies like [60]. In [46], a hybrid approach combining a PSO optimizer with back-propagation to train MLP networks was suggested. The technique was applied to the learning of MLP networks and evaluated on various data classification problems. The PSO algorithm was used to train feed-forward neural networks by Ismail and other researchers in 2005 [61].
The authors of [62] presented the Ant Colony Optimization (ACO) algorithm to address continuous optimization. The ACO was also integrated with other gradient-based techniques, including the Levenberg-Marquardt and back-propagation algorithms. Socha et al. trained a feed-forward neural network using the ACO algorithm in 2007 [63]. Karaboga and other researchers employed the artificial bee colony (ABC) and enhanced ABC algorithms to train feed-forward neural networks between 2007 and 2011 [64]. In 2019, Ghorbani et al. [65] trained feed-forward neural networks using an improved gravitational search algorithm (GSA). In 2011, Mirjalili et al. utilised the magnetic optimization algorithm (MOA) to train feed-forward neural networks, and in 2015 they employed the grey wolf optimizer (GWO) for the same purpose [66,67]. In 2018, Pu et al. employed a novel hybrid biogeography-based optimization technique to train a feed-forward neural network [68]. In 2019, Zhao et al. used a selfish herds optimization technique with orthogonal design and information updating [69], and Xu et al. employed a hybrid Nelder-Mead and dragonfly algorithm [70], for the same task. In their paper [71], Goerick et al. proposed a novel MLP learning mechanism based on the Evolution Strategy (ES). Ilonen et al. used the differential evolution (DE) optimization method to improve the MLP learning process in [52]. Aljarah et al. trained neural networks using the bird swarm algorithm (BSA) in 2019 [72]. In 2021, Sağ et al. used the vortex search optimization algorithm for training feed-forward neural networks [73]. In 2022, Chatterjee et al. proposed a chaotic oppositional-based whale optimization algorithm to train a feed-forward neural network [74]. In 2022, Gülcü [75] trained feed-forward neural networks using the dragonfly algorithm. In 2023, Kumar [76] used ZEALOUS-PSO to train multilayer perceptron neural networks.
In 2023, Emambocus surveyed the training of neural networks with different types of optimization algorithms [77]. Other researchers have recently used recent swarm intelligence algorithms for training feed-forward neural networks [78][79][80][81][82][83][84].

Feed forward neural network (FNN)
The multilayer feed-forward neural network structure is composed of three kinds of layers: an input layer, a hidden layer, and an output layer. The hidden part may consist of one or more layers, and neurons are arranged parallel to one another in each layer [85]. A feed-forward neural network (FNN) with three layers is shown in Fig 1. It has an input layer with n nodes, a single hidden layer with h nodes, and an output layer with m nodes. A weight [86] is the one-way connection between nodes in adjacent layers. In the FNN depicted in Fig 1, the hidden layer receives the n inputs from the input layer after they are multiplied by their corresponding weights. In the hidden layer, the sigmoid function processes these weighted sums to produce the hidden layer's output values, which are then multiplied by the appropriate weights and passed as input to the output layer. The output layer's values are calculated in the same way using the sigmoid transfer function.
First, the weighted sum of the inputs is calculated using Eq 1:

$$s_j = \sum_{i=1}^{n} W_{ij} X_i - \theta_j, \quad j = 1, 2, \ldots, h \tag{1}$$

where $n$ is the number of input nodes, $W_{ij}$ is the connection weight from the $i$-th node in the input layer to the $j$-th node in the hidden layer, $\theta_j$ is the bias (threshold) of the $j$-th hidden node, and $X_i$ denotes the $i$-th input. The output value of each hidden node is then determined by Eq 2 using the sigmoid function:

$$S_j = \mathrm{sigmoid}(s_j) = \frac{1}{1 + e^{-s_j}}, \quad j = 1, 2, \ldots, h \tag{2}$$

The input and output values of the output layer are calculated from the hidden layer's outputs in the same way, using Eqs 3 and 4:

$$o_k = \sum_{j=1}^{h} W_{jk} S_j - \theta'_k, \quad k = 1, 2, \ldots, m \tag{3}$$

$$O_k = \mathrm{sigmoid}(o_k) = \frac{1}{1 + e^{-o_k}}, \quad k = 1, 2, \ldots, m \tag{4}$$

where $W_{jk}$ is the connection weight from the $j$-th hidden node to the $k$-th output node, and $\theta'_k$ is the bias (threshold) of the $k$-th output node. The connection weights and bias values are the most critical aspects of the FNN, as Eqs 1, 2, 3 and 4 show, because they define the final output value. The main aim of FNN training is to find the weights and bias values that produce the most ideal output for a given input.
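The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes a single hidden layer, a logistic sigmoid, and that each bias is subtracted from the weighted sum as a threshold. The function and argument names (`fnn_forward`, `w_ih`, `theta_h`, `w_ho`, `theta_o`) are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used as the transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def fnn_forward(x, w_ih, theta_h, w_ho, theta_o):
    """Forward pass of an n-h-m feed-forward network.

    x          : list of n inputs
    w_ih[i][j] : weight from input node i to hidden node j
    theta_h[j] : bias (threshold) of hidden node j (assumed to be subtracted)
    w_ho[j][k] : weight from hidden node j to output node k
    theta_o[k] : bias (threshold) of output node k (assumed to be subtracted)
    """
    n, h, m = len(x), len(theta_h), len(theta_o)
    # Hidden layer: weighted sum of the inputs minus the threshold, then sigmoid.
    hidden = [
        sigmoid(sum(w_ih[i][j] * x[i] for i in range(n)) - theta_h[j])
        for j in range(h)
    ]
    # Output layer: same computation applied to the hidden outputs.
    return [
        sigmoid(sum(w_ho[j][k] * hidden[j] for j in range(h)) - theta_o[k])
        for k in range(m)
    ]
```

With all weights and biases set to zero, every node outputs sigmoid(0), so the network returns 0.5 for any input; training moves the weights and biases away from such uninformative values.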

The proposed MWChOA algorithm
We describe the main structure of the proposed MWChOA algorithm in the following subsections.

Chimp optimization algorithm (ChOA)
In the following subsections, we highlight the social life, inspiration, and structure of the standard ChOA as follows.

Social life and inspiration.
Chimps are a kind of ape, and they are considered among the most intelligent animals in the world due to their large brain-to-body ratio. They live in groups. Each group tries to explore the environment (search space) with a different strategy. In a chimp group, each individual has a different method of hunting. There are four types of individuals responsible for hunting in the group: drivers, barriers, chasers, and attackers. The drivers pursue the prey without catching it. The barriers build a dam to block the progression of the prey. The chasers pursue the prey rapidly to catch up with it. Finally, the attackers foresee the escape route of the prey in order to force it back towards the chasers. The other individuals in the group follow the four leaders (drivers, barriers, chasers, and attackers) to hunt the prey by updating their positions based on the leaders' positions. In the final stage of hunting, the chimps show a distinct social behavior driven by sexual motivation: they abandon their hunting duties and search for food randomly.

Chimp optimization algorithm implementation.
The Chimp Optimization Algorithm (ChOA) is a recent nature-inspired algorithm that simulates the life of chimps and their behavior in the hunting process. The ChOA was proposed in 2020 by M. Khishe et al. [87]. In this subsection, we model the social behavior of the chimps by specifying the mathematical model of the ChOA. The population in the ChOA differs from that of other swarm intelligence algorithms: it contains four groups, which are drivers D, barriers B, chasers C, and attackers A. The prey represents the optimal solution; however, its location in the search space is hard to know. During the search, the four leaders are assigned as follows.
• The attacker individual $\tilde{X}_A$. The attackers' group represents the exploitation process in the ChOA. The individual $\tilde{X}_1$ represents the first solution in the group, while $\tilde{D}_A$ represents the distance between the position of the current solution $\tilde{X}$ and the position of the attacker solution $\tilde{X}_A$, as given in Eq 5.
• The barrier individual $\tilde{X}_B$. The barrier group participates with the chaser and driver groups in the exploitation process in the ChOA. The individual $\tilde{X}_2$ represents the second solution in the group, while $\tilde{D}_B$ represents the distance between the position of the current solution $\tilde{X}$ and the position of the barrier solution $\tilde{X}_B$, as given in Eq 6.
• The chaser individual $\tilde{X}_C$. The chaser group is part of the exploitation process in the ChOA. The individual $\tilde{X}_3$ represents the third solution in the group, while $\tilde{D}_C$ represents the distance between the position of the current solution $\tilde{X}$ and the position of the chaser solution $\tilde{X}_C$, as given in Eq 7.
• The driver individual $\tilde{X}_D$. The driver group is a stage in the exploitation process in the ChOA. The individual $\tilde{X}_4$ represents the fourth solution in the group, while $\tilde{D}_D$ represents the distance between the position of the current solution $\tilde{X}$ and the position of the driver solution $\tilde{X}_D$, as given in Eq 8.
The vectors $\tilde{A}$, $\tilde{C}$, and $\tilde{M}$ have a great effect on the algorithm's performance, and their values are calculated as shown in the following subsection.

The main parameters of the algorithm.
The ChOA has three main parameters, the vectors $\tilde{A}$, $\tilde{C}$, and $\tilde{M}$, which are calculated as follows. The vector $\tilde{A}$ is responsible for switching from the exploration phase to the exploitation phase. It is computed by Eq 9, and its value lies in the range [−2f, 2f]. The vector $\tilde{C}$ is a random vector in the range [0, 2]:

$$\tilde{C} = 2\,\tilde{r}_2 \tag{10}$$

It is applied in the ChOA to increase the diversity of the algorithm and help it escape from local optima.
The vector $\tilde{M}$ represents the sexual motivation in the ChOA, and it is computed based on a chaotic map:

$$\tilde{M} = \text{Chaotic value} \tag{11}$$

where the vectors $\tilde{r}_1$ and $\tilde{r}_2$ are random vectors in [0, 1], and f is a control value that is reduced linearly from 2.5 to 0.
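A short Python sketch of how these coefficients could be drawn at each iteration is given here. The linear schedule for f follows the description above. The expression used for the vector A (2·f·r1 − f) is an assumption based on the commonly cited ChOA formulation [87], not a formula confirmed by this text; under that assumption every component of A stays within [−f, f], which is inside the range stated above. The names `control_f` and `draw_coefficients` are hypothetical.

```python
import random


def control_f(t, max_iter, f_start=2.5):
    """Control value f, reduced linearly from f_start to 0 over the run."""
    return f_start * (1.0 - t / max_iter)


def draw_coefficients(f, dim, rng=random):
    """Draw one A vector and one C vector of length dim.

    A uses the assumed form 2*f*r1 - f; C is 2*r2, a random vector in [0, 2].
    """
    r1 = [rng.random() for _ in range(dim)]
    r2 = [rng.random() for _ in range(dim)]
    a = [2.0 * f * r - f for r in r1]  # assumed form, components in [-f, f]
    c = [2.0 * r for r in r2]          # components in [0, 2)
    return a, c
```

Because f shrinks towards zero, the magnitude of A shrinks with it, which is what allows the search to move from wide exploratory steps to small exploitative ones late in the run.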

The exploration and the exploitation phases.
The exploration phase is handled by the driver, chaser, and barrier solutions, while the exploitation phase is handled by the attacker solution. According to Eq 12, the ChOA moves from the exploration phase to the exploitation phase depending on whether the value of the vector $\tilde{A}$ lies within the range [−1, 1]. The impact of the vector $\tilde{A}$ on the exploration and exploitation phases is depicted in Fig 2.

The solution updating process.
The individuals in the population update their positions based on the positions of the four leader individuals (attacker, driver, chaser, and barrier). This process is formulated in Eq 13, where t is the current iteration and the vectors $\tilde{X}_1$, $\tilde{X}_2$, $\tilde{X}_3$, and $\tilde{X}_4$ are calculated in Eqs 5, 6, 7 and 8. Fig 3 shows the individuals' updating process in the ChOA.
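The leader-driven update can be illustrated with the following Python sketch. It assumes the form commonly used in the ChOA literature: for each leader, a distance D = |C·X_leader − M·X| and a candidate X_k = X_leader − A·D, with the new position taken as the plain average of the four candidates. These expressions are assumptions for illustration and may differ in detail from Eqs 5 to 8 and Eq 13; the function names are hypothetical.

```python
def leader_step(x, leader, a, c, m):
    """Candidate position driven by one leader (component-wise).

    Assumed form: D = |c * leader - m * x|, candidate = leader - a * D.
    """
    return [
        l - ai * abs(ci * l - mi * xi)
        for xi, l, ai, ci, mi in zip(x, leader, a, c, m)
    ]


def choa_update(candidates):
    """New position as the component-wise mean of the leader-driven candidates."""
    k = len(candidates)
    return [sum(column) / k for column in zip(*candidates)]
```

For a one-dimensional example with current position 1.0, leader position 3.0, A = 0.5, and C = M = 1.0, the distance is 2.0 and the candidate is 3.0 − 0.5 × 2.0 = 2.0, so the solution moves halfway towards the leader.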

The sexual motivation process.
In the final stage of the hunting process, some chimps abandon their duties and try to get food randomly. This behavior is simulated in the ChOA to accelerate convergence and avoid trapping in local optima. In the ChOA, the value of the parameter μ is responsible for switching between the normal position update and the chaotic position update for all individuals in the population. This process is formulated in Eq 14. The steps of the standard ChOA are listed below.

3: Initialize the population X randomly, where $\tilde{X}_i^{(t)}$, i = 1, . . ., N.
4: Calculate the fitness function of each individual $\tilde{X}_i^{(t)}$.
5: Assign the attacker $X_A$ as shown in Eq 5.
6: Assign the barrier $X_B$ as shown in Eq 6.
7: Assign the chaser $X_C$ as shown in Eq 7.
8: Assign the driver $X_D$ as shown in Eq 8.
9: repeat
10: for i = 1 to N do
11: Generate random numbers $r_1$ and $r_2$ ∈ [0, 1].
12: Update the values of the vector $\tilde{M}$ as shown in Eq 11.
13: Update the values of the vector $\tilde{C}$ as shown in Eq 10.
14: Update the value of the coefficient f.
15: … Update the values of the vectors $\tilde{M}$, $\tilde{A}$, $\tilde{C}$ and the coefficient f.
27: Update the attacker $X_A$ as shown in Eq 5.
28: Update the barrier $X_B$ as shown in Eq 6.
29: Update the chaser $X_C$ as shown in Eq 7.
30: Update the driver $X_D$ as shown in Eq 8.
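The switch between the normal and the chaotic update can be sketched as follows. Several details here are assumptions made only for illustration: the threshold of 0.5 on μ, the use of the logistic map as the chaotic map, and the way a chaotic value is scaled onto the search bounds. None of these is confirmed by the text above, and the function names are hypothetical.

```python
def logistic_map(value, r=4.0):
    """One step of the logistic map, used here as an illustrative chaotic map."""
    return r * value * (1.0 - value)


def chaotic_position(lower, upper, chaos_state):
    """Build a position from successive chaotic values scaled to the bounds."""
    position = []
    for lo, hi in zip(lower, upper):
        chaos_state = logistic_map(chaos_state)
        position.append(lo + chaos_state * (hi - lo))
    return position, chaos_state


def motivated_update(normal_position, lower, upper, chaos_state, mu, threshold=0.5):
    """Keep the normal update when mu < threshold, otherwise jump chaotically.

    The threshold value is an assumption; mu is expected to be drawn in [0, 1].
    """
    if mu < threshold:
        return normal_position, chaos_state
    return chaotic_position(lower, upper, chaos_state)
```

In practice the chaotic state should start away from values such as 0, 0.5, and 1, because the logistic map with r = 4 collapses to 0 from them after a few steps.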

The structure of the proposed MWChOA algorithm
In the standard ChOA, the positions of the other chimps are updated using only the first four ChOA solutions: driver, chaser, attacker, and barrier. In other words, the other chimps are attracted towards the four best solutions (driver, chaser, barrier, and attacker). Although attackers can naturally predict the path of the prey's movement, there is no guarantee that their solution is always the best, because chimps sometimes abandon their duties during the hunt or keep the same duty throughout the process. If the positions of the other chimps are updated based on the attackers, they may become stuck in local optima and be unable to explore new parts of the search space, since their solution space is extremely concentrated around the attacker's solutions. The same reasoning applies to the other best solutions (driver, chaser, barrier). To increase the algorithm's convergence speed, we use only the first three leader solutions (attacker, barrier, and chaser); Eqs 15, 16 and 17 are used instead of Eqs 5, 6, 7 and 8. Then, in order to hasten convergence and enhance exploration and exploitation, a position-weighted equation based on weights is created [88]. The other chimps update their positions using these equations; that is, they are essentially compelled to change their positions in accordance with those of the chaser, attacker, and barrier. In light of the preceding arguments, a new way of updating the positions of the other chimps can be devised. The proposed weighting strategy is based on the Euclidean distance of the step size and is given in Eqs 18, 19 and 20, where the learning rates from the attacker, barrier, and chaser are denoted by W1, W2 and W3 respectively, and |.| denotes the Euclidean distance. The position-weighted relationship is given in Eq 21. Instead of Eq 13 of the traditional ChOA, the position-weighted relationship of Eq 21 is used in MWChOA. The primary distinction between Eq 21 and the conventional position update, Eq 13, is the application of the appropriate learning rates. Consequently, the relationship in Eq 22 is used.
In other words, the final position of each of the other chimps lies at a random place within a circle around the prey that is defined by the positions of the attacker, barrier, and chaser. The main loop of the proposed MWChOA includes the following steps.

for i = 1 to N do
10: Generate random numbers $r_1$ and $r_2$ ∈ [0, 1].
11: Update the values of the vector $\tilde{M}$ as shown in Eq 11.
12: Update the values of the vector $\tilde{C}$ as shown in Eq 10.
13: Update the value of the coefficient f.
14: Update the value of the vector $\tilde{A}$ as shown in Eq 9.
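The three-leader weighted update can be illustrated with the Python sketch below. It assumes the weighting used in the weighted ChOA literature [88], adapted to three leaders: each weight is the Euclidean norm of one leader-driven candidate divided by the sum of the three norms, and the new position is the weighted combination of the candidates divided by the number of leaders. This is a hedged reading of Eqs 18 to 21, not a confirmed transcription, and the function names are hypothetical.

```python
import math


def euclidean_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def leader_weights(x1, x2, x3):
    """Learning rates W1, W2, W3 from the norms of the three candidates."""
    norms = [euclidean_norm(v) for v in (x1, x2, x3)]
    total = sum(norms)
    if total == 0.0:
        # Degenerate case (all candidates at the origin): fall back to equal weights.
        return [1.0 / 3.0] * 3
    return [n / total for n in norms]


def weighted_update(x1, x2, x3):
    """Position-weighted update from the attacker, barrier, and chaser candidates."""
    w1, w2, w3 = leader_weights(x1, x2, x3)
    scale = 1.0 / (w1 + w2 + w3)
    return [
        scale * (w1 * a + w2 * b + w3 * c) / 3.0
        for a, b, c in zip(x1, x2, x3)
    ]
```

For candidates [3, 4], [0, 5], and [6, 8], the norms are 5, 5, and 10, so the weights are 0.25, 0.25, and 0.5, and the candidate with the largest step contributes most to the new position.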

Simulation platform
All algorithms are tested in Matlab R2018a and run on a PC with an Intel(R) Core(TM) i7-9700 processor 3.00 GHz, 8 GB of RAM, and Windows 10.

Datasets
The proposed MWChOA is tested on eleven benchmark datasets from the UCI repository [89]. Table 1 displays the chosen datasets and their attributes and classes.

XOR dataset.
A well-known nonlinear benchmark problem is the N-bit XOR problem. The goal is to determine the parity of the number of "1"s in the input vector, that is, to return the XOR of its bits: the output is "1" if the input vector contains an odd number of "1"s, and "0" if it contains an even number of "1"s [90]. The XOR dataset has three inputs, eight training/test samples, and one output. To tackle this problem, we employ the 3-7-1 FNN structure.

Balloon dataset.
This dataset is based on balloon inflation experiments conducted under a variety of conditions. It has 16 training instances and 16 test instances, each with four attributes: colour, size, action, and age. There are therefore four inputs and one output, which shows whether the balloon is inflated or not [91]. To classify this dataset, we utilise a neural network with the structure 4-9-1.

Breast cancer dataset.
William H. Wolberg of the University of Wisconsin-Madison Hospital developed this dataset. The goal of this collection is to use photographs to determine whether a patient has cancer, which makes its classification and recognition particularly important. The dataset contains 699 instances and 9 attributes, including clump thickness, cell size uniformity, cell shape uniformity, and marginal adhesion. The output is 2 if the classification result is a benign tumour and 4 if it is a malignant tumour [92]. The 9-19-1 FNN architecture is used to classify this dataset.

Iris dataset.
The iris dataset is used to classify iris flowers into three categories: Setosa, Versicolor, and Virginica. There are therefore three outputs. The dataset has 150 examples with four attributes: sepal length, sepal width, petal length, and petal width [93], giving four inputs. As a result, an FNN with a structure of 4-9-3 is used for this dataset.

Heart dataset.
This dataset is used for the diagnosis, recognition, and classification of cardiac single-photon emission computed tomography (SPECT) images. The classification result indicates whether the patient is normal or abnormal [94]. All features are in binary format. This dataset is complex: it has 22 features (22 inputs), 80 training samples, and 187 test cases. For training, we create a neural network with a structure of 22-45-1. The experimental findings are displayed in Table 2. The heart dataset is the hardest training dataset in the classification set.

Hepatitis dataset.
The hepatitis dataset classifies whether the patient lives or dies [95]. It has 19 features and therefore 19 inputs. We build a neural network with the structure 19-39-1 for training the FNN.

Haberman dataset.
The dataset includes data from a study on the prognosis of breast cancer patients who underwent surgery, carried out at the University of Chicago's Billings Hospital between 1958 and 1970 [96]. The classification result shows whether the patient died within 5 years or survived for 5 years or more. This dataset employs an FNN with a structure of 3-7-1.

Liver dataset.
All of the blood tests that make up the attributes of this dataset are thought to be sensitive to liver diseases that may be caused by drinking too much alcohol [97]. One single male person's record is contained in each line of the dataset. We create a neural network with a structure of 6-13-1.

Ionosphere dataset.
This dataset consists of radar data gathered by a system in Goose Bay, Labrador [98]. The system comprises a phased array of 16 high-frequency antennas with a total transmitted power of around 6.4 kilowatts. The intended targets were free electrons in the ionosphere, and each radar return is labelled good or bad. "Good" returns show evidence of some kind of structure in the ionosphere, whereas "bad" returns do not; their signals pass through the ionosphere. The structure of the FNN is 34-69-1.

Lung cancer dataset.
The Lung dataset is a large dataset that includes all of the study data accessible for analysis on lung cancer screening, incidence, and death [99]. We build a neural network with the structure 56-113-1 for training FNN.

Pima dataset.
The National Institute of Diabetes and Digestive and Kidney Diseases is the original source of this dataset [100]. The dataset's goal is to diagnostically classify whether or not a patient has diabetes. The 9-19-1 FNN structure is used to classify this dataset.

Parameter setting
The parameter settings of the proposed algorithm and the other algorithms are shown in Tables 3 and 4.

The MWChOA for training FNN
• The initial population. The initial population contains the weights and biases, which are generated randomly as shown in Eq 23.

$$V = \{\tilde{W}, \tilde{\theta}\} = \{W_{11}, W_{22}, \ldots, W_{nn}, \theta_1, \theta_2, \ldots, \theta_n\} \tag{23}$$

• The fitness function. The proposed algorithm uses the mean square error (MSE) to evaluate the obtained results by measuring the difference between the desired and the actual outputs, as shown in Eq 24. The main goal is to minimize the MSE to obtain the best solution, as shown in Eq 26.

$$MSE = \sum_{i=1}^{m} \left(o_i^k - d_i^k\right)^2 \tag{24}$$

where $d_i^k$ is the desired output of the $i$-th input unit when the $k$-th training sample is utilised, $o_i^k$ is the actual output of the $i$-th input unit when the $k$-th training sample occurs in the input, and $m$ is the number of outputs. In datasets, there is always more than one training sample, so the FNN should be evaluated over all training samples. In this case, the MSE averaged over all training samples is as follows:

$$\overline{MSE} = \sum_{k=1}^{s} \frac{\sum_{i=1}^{m} \left(o_i^k - d_i^k\right)^2}{s} \tag{25}$$

where $s$ is the number of training samples and $m$ is the number of outputs [59].

$$\text{Minimize: } F(V) = \overline{MSE} \tag{26}$$

• The hidden layer nodes h. The structure of the FNN is also important in the experimental setting, and we set the number of hidden layer neurons for the datasets as follows:

$$h = 2n + 1 \tag{27}$$

where $n$ denotes the number of inputs and $h$ denotes the number of hidden nodes.
• The evaluated metrics. Classification metrics assess the effectiveness of the proposed algorithm for training the FNN and determine how accurate the classification is [101]. Accuracy shows the proportion of cases that are correctly classified. It is derived by dividing the number of accurate predictions by the total number of predictions, and it is calculated by Eq 28.

$$Accuracy = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \tag{28}$$

where a true positive $T_P$ is a positive-class sample that has been correctly categorised. A false positive $F_P$ is a sample that should have been labelled as negative but was instead classified as positive. A true negative $T_N$ is a correctly classified negative-class sample. A false negative $F_N$ is a sample that should have been labelled as positive but was incorrectly classified as negative.
Recall, also called the true positive rate (TPR) or hit rate of a classifier, represents the proportion of correctly identified positive samples to the total number of positive samples and is calculated using Eq 29.

$$Recall = \frac{T_P}{T_P + F_N} \tag{29}$$

Precision represents the proportion of correctly identified positive samples to the total number of samples predicted as positive, as given in Eq 30.

$$Precision = \frac{T_P}{T_P + F_P} \tag{30}$$

The $F_1$-score is the harmonic mean of precision and recall. Its values range from zero to one, with higher values indicating better classification ability, and it is calculated using Eq 31.

$$F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{31}$$
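The items in this list can be made concrete with a short Python sketch. It shows the hidden-node rule implied by the network structures used for the datasets (for example 4-9-1 and 22-45-1), the length of the solution vector for a single-hidden-layer network, the MSE averaged over the training samples, and the four classification metrics. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something specified in the text.

```python
def hidden_nodes(n_inputs):
    """Number of hidden nodes h = 2n + 1 for n inputs."""
    return 2 * n_inputs + 1


def solution_dimension(n, h, m):
    """Weights plus biases of an n-h-m network with one hidden layer."""
    return n * h + h * m + h + m


def average_mse(outputs, targets):
    """Squared output error summed per sample, then averaged over the samples."""
    s = len(outputs)
    total = 0.0
    for out, tgt in zip(outputs, targets):
        total += sum((o - d) ** 2 for o, d in zip(out, tgt))
    return total / s


def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1
```

For the 3-7-1 network used on the XOR dataset, each candidate solution holds 3×7 + 7×1 weights and 7 + 1 biases, so the optimizer searches a 36-dimensional space.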

Time complexity of the proposed MWChOA
The time complexity of the proposed algorithm is calculated based on the population size N, the problem dimension n and the maximum number of iterations t as follows.
• Initialize the parameters. The time complexity for initializing the parameters, such as $\tilde{A}$, $\tilde{M}$, $\tilde{C}$, and the coefficient f, is a constant C.
• Initialize the population. The population contains N solutions and n variables. The time complexity to generate the initial population is O(N × n).
• Update the Solutions. The time complexity to update the solutions in the population is O(N × n).
• Evaluate the fitness function. The time complexity to evaluate the initial population is O(N × n).
The overall time complexity is O(C × t + N × n × t), where t is the maximum iteration number.

The performance of the proposed MWChOA algorithm
The proposed MWChOA algorithm is a modified version of the standard ChOA and the WChOA. In order to verify its efficiency, we compare it against these two algorithms by calculating the four evaluation metrics (accuracy, precision, recall and F1-score). The results are reported in Tables 6 and 7, where the overall best solution is reported in bold text.
Also, we plot the convergence curves of the proposed algorithm, the ChOA, and the WChOA to confirm the efficiency of the proposed algorithm as shown in

The comparison between MWChOA and other algorithms
We conducted another experiment to test the efficiency of the proposed MWChOA algorithm on the five most commonly used datasets (XOR, balloon, breast cancer, iris, and heart) by comparing it against 16 SI algorithms such as GWO [86], ABC [102], HS (Harmony Search) [103], GGSA (Gbest-guided Gravitational Search Algorithm) [104], DE (Differential Evolution) [105], PSO [86], GA [86], ACO [86], ES [86], PBIL [86], SSA [106], and SSO [107]. The average (AVE), the standard deviation (STD), and the classification rate (CR%) are reported in Tables 8-11 for all algorithms after 10 runs. The overall best results are reported in bold text. Also, the AVE, STD, and CR are plotted for all algorithms in Figs 16-30. The results in the tables and figures show that the proposed algorithm produces good results in most cases.

The Wilcoxon test
We conducted the Wilcoxon statistical test and report its p-values in order to strengthen the performance evaluation of the optimization method [108]. This non-parametric test is run at a significance level of 5% to determine whether the MWChOA results differ significantly from the best results of the other algorithms. Table 12 displays the p-values for all algorithms. A p-value of less than 0.05 is typically regarded as sufficient evidence against the null hypothesis. Table 12 above demonstrates that only the Iris dataset and the p-value of ABC

Conclusion and future works
Swarm intelligence (SI) algorithms have been applied to solve many real-world problems. One of these problems is training the feed-forward neural network (FNN). However, most of these algorithms suffer from slow convergence and become stuck in local optima. To overcome these issues, we propose a modified version of the standard Chimp Optimization Algorithm (ChOA). The proposed algorithm is called the modified weighted chimp optimization algorithm (MWChOA). The proposed algorithm uses three leader solutions instead of the four leaders used in the standard ChOA and the weighted chimp optimization algorithm (WChOA). Reducing the number of selected leader solutions in the proposed MWChOA improves the results and increases their accuracy. To test the efficiency of the proposed MWChOA, we tested it on eleven benchmark datasets and compared it against 16 SI algorithms.
In future work, we will apply the proposed algorithm to train the most known deep learning